IASSNS-HEP-00/79 
Entanglement purification of unknown quantum states 



Todd A. Brun, 1,* Carlton M. Caves, 2 and Rüdiger Schack 3
1 Physics Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA 
2 Department of Physics and Astronomy, University of New Mexico, 
Albuquerque, New Mexico 87131-1156, USA 
3 Department of Mathematics, Royal Holloway, University of London, Egham, Surrey TW20 0EX, UK

(9 October 2000) 



Abstract 



A concern has been expressed that "the Jaynes principle can produce fake 
entanglement" [R. Horodecki et al, Phys. Rev. A 59, 1799 (1999)]. In this 
paper we discuss the general problem of distilling maximally entangled states 
from N copies of a bipartite quantum system about which only partial infor- 
mation is known, for instance in the form of a given expectation value. We 
point out that there is indeed a problem with applying the Jaynes principle 
of maximum entropy to more than one copy of a system, but the nature of 
this problem is classical and was discussed extensively by Jaynes. Under the 
additional assumption that the state $\rho^{(N)}$ of the $N$ copies of the quantum system
is exchangeable, one can write down a simple general expression for $\rho^{(N)}$.
We show how to modify two standard entanglement purification protocols, 
one-way hashing and recurrence, so that they can be applied to exchangeable 
states. We thus give an explicit algorithm for distilling entanglement from an 
unknown or partially known quantum state. 
I. INTRODUCTION



Entanglement is a quantum-mechanical resource that can be used for a number of tasks, 
including quantum teleportation, quantum cryptography, and quantum dense coding. Since 
real quantum channels are noisy, it is very difficult to create perfect entanglement directly 
between two distant parties. There is thus a need to purify (or distill) partial entanglement. 
Suppose two parties share $N$ pairs of qubits such that each pair is in the same entangled,
but mixed state $\rho$, the total state $\rho^{(N)}$ of all $N$ pairs thus being the $N$-fold tensor product
$\rho^{(N)} = \rho^{\otimes N} = \rho \otimes \cdots \otimes \rho$. There exist protocols [1-4], using only local operations and
classical communication, which allow the two parties to transform $M < N$ of the pairs into
maximally entangled states, for instance singlet states. In the limit $N \to \infty$, the fidelity of
the singlets approaches 1 and the fraction $M/N$ approaches a fixed limit, called the asymptotic yield.

In this paper, we consider the more general case in which the initial state $\rho^{(N)}$ is not
a tensor-product state. This corresponds to the realistic situation that the state $\rho$ of each
individual pair is not perfectly known, for instance because one of the particles has been sent
through a channel with only partially known characteristics. In Secs. III and IV, we apply
the entanglement purification methods known as one-way hashing and recurrence [?]
to partially known, including completely unknown, quantum states. It turns out that the
generalization of the recurrence method is straightforward, whereas the hashing method as it
is described in Ref. [?] depends on the initial state being of tensor-product form and therefore
requires a more careful analysis. Unlike Briegel et al. [5], who have studied entanglement
purification with imperfect quantum operations, we assume that all operations are error-free.
A paper related to ours is Ref. [6] by Eisert et al., who study how distillable entanglement
decreases when information about a quantum state is lost.

Before we turn to the actual entanglement purification protocols, we discuss, in Sec. II,
the problem of what density operator $\rho^{(N)}$ to assign to $N$ pairs of qubits if only partial
information is available. This is an unsolved problem, and we do not attempt to give a general
solution. We show, however, that under the additional assumption of exchangeability the
state $\rho^{(N)}$ must have a certain simple form, which is amenable to entanglement purification.
Our discussion also provides a resolution of the apparent paradox found by Horodecki et al.
[7], who give an example where applying the Jaynes principle of maximum entropy leads
to a state with more distillable entanglement than seems to be warranted by the available
information. We conclude in Sec. V.

II. STATE ASSIGNMENT BASED ON PARTIAL INFORMATION 

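Let us consider the example given by Horodecki et al. [7]. The authors consider a system
composed of a single pair of qubits and define an operator

$$ B = \tfrac{1}{2}\left(\sigma_x \otimes \sigma_x + \sigma_z \otimes \sigma_z\right) = \Phi_+ - \Psi_- , \tag{1} $$

where $\Psi_\pm = |\Psi_\pm\rangle\langle\Psi_\pm|$, $\Phi_\pm = |\Phi_\pm\rangle\langle\Phi_\pm|$ are projectors onto the Bell states,

$$ |\Phi_\pm\rangle = \tfrac{1}{\sqrt{2}}\left(|00\rangle \pm |11\rangle\right), \qquad
|\Psi_\pm\rangle = \tfrac{1}{\sqrt{2}}\left(|01\rangle \pm |10\rangle\right). \tag{2} $$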

Our definition of $B$ differs from that of Ref. [7] by a constant factor to simplify the
expressions. If all that is known about the system state is the expectation value $\langle B\rangle = 1/2$,
then Jaynes's principle of maximum entropy stipulates that one should assign the state of
maximum von Neumann entropy compatible with the constraint $\langle B\rangle = 1/2$, which in this
case is

$$ \rho_J = \tfrac{9}{16}\Phi_+ + \tfrac{1}{16}\Psi_- + \tfrac{3}{16}\left(\Psi_+ + \Phi_-\right) . \tag{3} $$

This state has distillable entanglement. Horodecki et al. [7] point out that the state

$$ \rho_H = \tfrac{1}{2}\Phi_+ + \tfrac{1}{4}\left(\Psi_+ + \Phi_-\right) \tag{4} $$

also satisfies the constraint $\langle B\rangle = 1/2$, but is separable and hence unentangled. They
conclude that the entanglement in the maximum entropy state $\rho_J$ is "fake," because it
violates the condition that an inference scheme "should not give us an inseparable estimated
state if only theoretically there exists a separable state consistent with the measured data."
As an alternative to the Jaynes principle, they propose first to minimize the entanglement
and then to find the state of maximum entropy among those states that have minimal
entanglement. For the constraint $\langle B\rangle = 1/2$, this alternative scheme results in the state $\rho_H$
given above.
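The weights quoted for the two candidate states can be checked with a few lines of Python. The sketch below is illustrative only and rests on two assumptions that are not spelled out in the text: that $B$ has eigenvalues $+1$ on $\Phi_+$, $-1$ on $\Psi_-$ and $0$ on $\Phi_-$, $\Psi_+$ (as follows from Eq. (1)), and the standard criterion that a Bell-diagonal state is entangled exactly when one Bell weight exceeds 1/2. The function names are hypothetical.

```python
import math

def maxent_bell_weights(b):
    """Maximum-entropy Bell weights subject to <B> = w(Phi+) - w(Psi-) = b.

    The maximum-entropy state has Gibbs form, weights ~ exp(lam * eigenvalue),
    and <B> = sinh(lam) / (cosh(lam) + 1) = tanh(lam / 2).
    """
    lam = 2.0 * math.atanh(b)
    z = math.exp(lam) + math.exp(-lam) + 2.0
    return {
        "Phi+": math.exp(lam) / z,
        "Psi-": math.exp(-lam) / z,
        "Phi-": 1.0 / z,
        "Psi+": 1.0 / z,
    }

def entropy_bits(weights):
    """Von Neumann entropy (in bits) of a Bell-diagonal state."""
    return -sum(w * math.log2(w) for w in weights if w > 0)

def is_entangled(weights):
    """Assumed criterion: Bell-diagonal states are entangled iff max weight > 1/2."""
    return max(weights) > 0.5

rho_J = maxent_bell_weights(0.5)
rho_H = {"Phi+": 0.5, "Psi-": 0.0, "Phi-": 0.25, "Psi+": 0.25}

print(rho_J)                                  # weights close to 9/16, 1/16, 3/16, 3/16
print(entropy_bits(rho_J.values()), entropy_bits(rho_H.values()))
print(is_entangled(rho_J.values()), is_entangled(rho_H.values()))
```

Both states satisfy the constraint; the maximum-entropy state has the larger entropy and is entangled, whereas the minimum-entanglement state sits exactly on the separable boundary.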



A simple defense of the Jaynes principle would be the following (see also Refs. [10,11]).
The alternative procedure proposed by Horodecki et al. assumes additional information 
about the two qubits, namely that entanglement is a priori unlikely. This would be rea- 
sonable, e.g., in a situation where the parties know that the state has been prepared by 
an adversary whose objective is to let them have as little entanglement as possible. But 
then more is known about the state than just the given expectation value, and hence the 
assumptions behind the Jaynes procedure are not fulfilled. 

If there is no specific additional information, however, the maximum entropy state
assignment $\rho_J$ is preferable to the minimum entanglement assignment $\rho_H$. Indeed, if a projective
measurement in the Bell basis is performed, assigning $\rho_H$ corresponds to assigning zero
probability to the measurement outcome $\Psi_-$, an outcome that is not ruled out by the constraint
$\langle B\rangle = 1/2$. In this sense, the minimum entanglement assignment is inconsistent with the
prior information. By contrast, no inconsistency of this kind can arise from the maximum
entropy assignment in the absence of prior information beyond the given expectation value.
Since no measurement of a single system can tell if that system is entangled or not, the
prediction of "fake entanglement" for $\rho_J$ can cause no difficulty. In particular, there is no
way to turn a single $\rho_J$ state into a maximally entangled state even probabilistically [12].



We now turn to the case in which the parties share not just one, but $N$ qubit pairs. We
denote by $\rho^{(N)}$ the total state of the $N$ pairs and assume that the $N$ pairs are known to satisfy
the constraints $\langle B\rangle_k = \mathrm{Tr}(\rho_k B) = 1/2$ for $k = 1, \ldots, N$, where $\rho_k$ is the reduced density
operator of the $k$th pair. In this case, the state assignment $\rho^{(N)} = \rho_J^{\otimes N}$ is not supported by
the prior information, even though this is the state of maximum entropy compatible with
the given expectation values. For large $N$, this state assignment corresponds to the definite
prediction that a nonzero number of perfect singlets can be distilled, which is certainly
not implied by the given expectation values. The alternative state assignment $\rho^{(N)} = \rho_H^{\otimes N}$
would, however, be equally unsupported by the prior information. It corresponds to the
definite prediction that no singlets can be distilled from the $N$ pairs, which is the minimum
number of distillable singlets compatible with the a priori knowledge. Although this is a
very cautious prediction, it is also not implied by the given expectation values.

The fact that a naive application of the principle of maximum entropy to many copies 
of a system fails is essentially of classical origin and is not unique to problems involving 
entanglement. Jaynes [13] has given a thorough discussion of this problem, which can be
explained by a simple example. Consider a possibly loaded die. All that is known about the
die is the mean value $\langle n\rangle = \sum_n n\,p(n) = 3.5$, where $p(n)$ is the probability of the outcome $n$,
$n = 1, \ldots, 6$. The probability distribution of maximum entropy compatible with the given
mean-value constraint is $p(n) = 1/6$ for $n = 1, \ldots, 6$. Now consider throwing the die $N$
times. A naive application of the maximum entropy principle would predict that the $N$
dice throws were independent and identically distributed (i.i.d.) according to the single-trial
distribution $p(n)$. This would lead to the prediction that the fraction of throws showing any
particular outcome would approximate 1/6 with arbitrary precision as $N$ tended to infinity.
This prediction, however, is not implied by the prior knowledge, which is compatible with
many possible outcome sequences, including sequences in which only the events $n = 1$ and
$n = 6$ ever occur, which is quite possible if the die is loaded. Moreover, with an i.i.d. distribution,
the results of earlier throws imply nothing about the probability of later outcomes. Even 
the most gullible gambler might become suspicious if 1 and 6 were the only outcomes after 
thousands of throws. 
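The difference between an i.i.d. assignment and an exchangeable one can be made concrete with a toy calculation. The sketch below is an illustration only: it assumes a prior that mixes just two hypothetical dice, a fair one and one that shows only 1 and 6, both of which satisfy the mean-value constraint $\langle n\rangle = 3.5$. Unlike the i.i.d. assignment, the mixture learns from earlier throws.

```python
from fractions import Fraction

# Two hypothetical single-throw distributions, both with mean 3.5.
FAIR = {n: Fraction(1, 6) for n in range(1, 7)}
LOADED = {1: Fraction(1, 2), 6: Fraction(1, 2)}  # only 1 and 6 ever occur

def mean(dist):
    """Mean outcome of a single-throw distribution."""
    return sum(n * p for n, p in dist.items())

def posterior_loaded(throws, prior_loaded):
    """Posterior probability of the LOADED hypothesis after the observed throws."""
    like_fair = Fraction(1)
    like_loaded = Fraction(1)
    for n in throws:
        like_fair *= FAIR.get(n, Fraction(0))
        like_loaded *= LOADED.get(n, Fraction(0))
    num = prior_loaded * like_loaded
    return num / (num + (1 - prior_loaded) * like_fair)

def predictive(n, throws, prior_loaded):
    """Probability that the next throw shows n, given the earlier throws."""
    q = posterior_loaded(throws, prior_loaded)
    return q * LOADED.get(n, Fraction(0)) + (1 - q) * FAIR[n]

throws = [1, 6, 6, 1, 1, 6, 1, 6, 6, 1]
print(mean(FAIR), mean(LOADED))                       # both equal 7/2
print(float(posterior_loaded(throws, Fraction(1, 100))))
print(float(predictive(3, throws, Fraction(1, 100))))
```

Even with a prior probability of only 1/100 for the loaded die, ten throws showing nothing but 1 and 6 make the loaded hypothesis overwhelmingly likely, and the predicted probability of a 3 on the next throw drops far below 1/6.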

In Ref. [13], Jaynes discusses how to choose the multi-trial distribution in the classical
case. The starting point of his discussion is the assumption that the probability distribution
of the $N$ dice throws is exchangeable. The same assumption is the starting point for our
quantum analysis. If exchangeability is assumed, the task of assigning a state of $N$ qubit
pairs compatible with the constraints given above is much simplified. A state of $N$ copies
of a system is exchangeable if it is a member of an exchangeable sequence $\rho^{(k)}$, $k = 1, 2, \ldots$.
An exchangeable sequence is defined by

(i) $\rho^{(k)} = \mathrm{Tr}_{k+1}\,\rho^{(k+1)}$ for all $k$, where $\mathrm{Tr}_{k+1}$ denotes the partial trace over the $(k+1)$th
system, and

(ii) each $\rho^{(k)}$ is invariant under permutations of the $k$ systems on which it is defined.



This definition is the quantum generalization of de Finetti's [14] definition of exchangeable
sequences of classical random variables.

A state $\rho^{(N)}$ is exchangeable if and only if it can be written in the form

$$ \rho^{(N)} = \int d\rho\, p(\rho)\, \rho^{\otimes N} , \tag{5} $$

where $d\rho$ is a measure on density operator space, and $p(\rho)$ is a normalized generating
function, $\int d\rho\, p(\rho) = 1$. This is a consequence of the quantum de Finetti theorem, the quantum
version of the fundamental representation theorem due to de Finetti [14]. The quantum
theorem was first proved by Hudson and Moody [15] after pioneering work by Størmer [16];
for an elementary proof see Ref. [17].



How, in general, do we pick $p(\rho)\,d\rho$? To our knowledge, there is no universal rule for
this task, although there exist a number of proposals for unbiased measures $d\rho$ on density
operator space [18-21]. These can be interpreted as proposals for state assignments for
$N$ systems under the sole assumption of exchangeability, i.e., using a generating function
$p(\rho) = 1$.

If, in addition to exchangeability, there is a mean-value constraint $\langle O\rangle = o$, the naive
Jaynes maximum entropy state assignment leads to a generating function of the form $p(\rho) =
\delta(\rho - \rho_J)$, where $\rho_J$ is the single-system state of maximum entropy, subject to the constraint;
this generating function is unacceptable for the reasons given above. A good choice of $p(\rho)\,d\rho$
should be nonzero for all $\rho$ that are compatible with the prior information; we should never
arbitrarily rule out any possibility. Similarly, $p(\rho)\,d\rho$ ought to vanish for any $\rho$ which is
actually ruled out by the prior information. We therefore would expect a multi-system
generalization of Jaynes's maximum entropy procedure to have the form

$$ p_{\rm maxent}(\rho)\,d\rho = M\,\delta\bigl[o - \mathrm{Tr}(O\rho)\bigr]\, f(\rho)\,d\rho , \tag{6} $$

where $M$ is a normalization constant and $f(\rho)\,d\rho$ is strictly positive. The exact form of the
function $f(\rho)$ and of the measure $d\rho$ is the subject of ongoing research. In the spirit of
the single-system Jaynes principle, $p(\rho)\,d\rho$ should favor states $\rho$ with higher von Neumann
entropy $S(\rho)$ and should give the usual $\rho_J$ when $N = 1$ [22].

Given an initial state assignment of the form (5), additional information can be obtained,
e.g., by making measurements on individual subsystems. Suppose a measurement outcome
$k$ is represented by a positive single-system operator $F_k$, with $\sum_k F_k = 1$; i.e., the $F_k$ form a
positive-operator-valued measure (POVM). Given that the subsystem is in state $\rho$, the
probability of getting outcome $k$ is $p(k|\rho) = \mathrm{Tr}[F_k\rho]$. If the total state is given by Eq. (5),
the probability of outcome $k$ in a measurement on a single subsystem is then

$$ p_k = \int d\rho\, p(\rho)\, p(k|\rho) . \tag{7} $$

After the measurement we must update the state of the remaining $N - 1$ systems by Bayes's
rule,

$$ \rho^{(N-1)} = \int d\rho\, p(\rho|k)\, \rho^{\otimes(N-1)} , \tag{8} $$

where

$$ p(\rho|k) = \frac{p(k|\rho)\, p(\rho)}{p_k} . \tag{9} $$

By doing different measurements on several subsystems, we acquire more and more data; if
these measurements are chosen well, the resulting posterior $p_{\rm post}(\rho)$ becomes more and more
peaked and has less and less dependence on the choice of prior $p(\rho)$. This procedure is a
straightforward Bayesian version of quantum state tomography [23-27].
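The update rule of Eqs. (7) and (9) is easy to illustrate when the integral over density operators is replaced by a sum over a finite set of candidate states. The sketch below is such a discretized illustration, not the general procedure: the two candidates are assumed to be the states $\rho_J$ and $\rho_H$ of Sec. II, the outcome probabilities $p(k|\rho)$ are assumed to be their Bell-basis weights (a joint measurement, used here only for simplicity), and all names are hypothetical.

```python
from fractions import Fraction

def bayes_update(prior, likelihood, k):
    """Discrete analogue of Eqs. (7) and (9).

    prior[name]          -- generating-function weight p(rho) of each candidate
    likelihood[name][k]  -- outcome probability p(k | rho) for each candidate
    Returns (p_k, posterior).
    """
    p_k = sum(prior[name] * likelihood[name][k] for name in prior)
    posterior = {name: prior[name] * likelihood[name][k] / p_k for name in prior}
    return p_k, posterior

# Outcome order (assumed): Phi+, Psi-, Phi-, Psi+
likelihood = {
    "rho_J": [Fraction(9, 16), Fraction(1, 16), Fraction(3, 16), Fraction(3, 16)],
    "rho_H": [Fraction(1, 2), Fraction(0), Fraction(1, 4), Fraction(1, 4)],
}
prior = {"rho_J": Fraction(1, 2), "rho_H": Fraction(1, 2)}

# Observing Psi- on one pair rules out rho_H for the remaining pairs.
p_k, posterior = bayes_update(prior, likelihood, k=1)
print(p_k, posterior)

# Observing Phi+ only shifts the weights slightly toward rho_J.
p_k0, posterior0 = bayes_update(prior, likelihood, k=0)
print(p_k0, posterior0)
```

Repeating the update with the posterior as the new prior corresponds to measuring further subsystems; the remaining systems are then described by Eq. (8) with the updated weights.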
The condition of exchangeability in combination with the quantum de Finetti theorem
provides only a partial solution of the problem of state assignment in the presence of partial
information, but we show in the next two sections that exchangeability alone is sufficient
to guarantee that the entanglement purification procedures known as one-way hashing and
recurrence can be carried out. The probability of distilling a positive yield of maximally
entangled states depends on the exact form of $p(\rho)\,d\rho$ in Eq. (5).

III. ENTANGLEMENT PURIFICATION BY ONE-WAY HASHING 

In this section, we first present a version of the one-way hashing algorithm that proceeds
by Bayesian updating of the probabilities for products of Bell states and that can in principle
be applied to general exchangeable states. We then briefly sketch the argument given in
Ref. [?] that for a product state $\rho^{\otimes N}$, where $\rho$ is Bell-diagonal with von Neumann entropy
$S$, the asymptotic yield of pure singlets is given by $N(1 - S)$. We show how to modify this
argument so that it can be applied to general exchangeable states. Finally we give a simplified
Bayesian hashing algorithm for exchangeable states and discuss its asymptotic yield.
Our analysis is restricted to pairs of qubits, but the method generalizes straightforwardly to
arbitrary Hilbert space dimensions.

We restrict attention to Bell-diagonal states, i.e., mixtures of the Bell states,

$$ \rho_{\vec{w}} = w_1\Psi_- + w_2\Psi_+ + w_3\Phi_- + w_4\Phi_+ , \tag{10} $$

where we denote the weights by $\vec{w} = \{w_1, w_2, w_3, w_4\}$, $w_1 + w_2 + w_3 + w_4 = 1$, $w_j \ge 0$
for $j = 1, \ldots, 4$. Most existing entanglement purification procedures begin by making this
assumption. If it does not hold, it is possible to put any state in this form by "twirling,"
that is, by randomly rotating both spins of an entangled pair. The final yield of maximally
entangled states cannot be diminished by omitting this step, however, so it is better to
think of twirling as a conceptual, rather than a physical procedure. After twirling, the
initial, exchangeable state (5) of our $N$ pairs of qubits becomes

$$ \rho^{(N)} = \int d\vec{w}\, p(\vec{w})\, \rho_{\vec{w}}^{\otimes N} , \tag{11} $$

where

$$ \int d\vec{w}\, p(\vec{w}) = 1 . \tag{12} $$
We now define the set of labeled states

$$ \rho_{00} = \Psi_- , \quad \rho_{01} = \Psi_+ , \quad \rho_{10} = \Phi_- , \quad \rho_{11} = \Phi_+ . \tag{13} $$

The first bit in the label tells us whether the pair is in a $\Psi$ or a $\Phi$ state; the second bit
tells us whether it is in a $+$ or $-$ state. If we are restricted to local measurements and
classical communication on a single pair, the best we can do is to determine one of these
two bits, but not both, and the pair will be left in an unentangled state. Bennett et al. have
shown, however, that if we can manipulate the qubits collectively, much more interesting
measurements are possible [?].

The first step is to rewrite the state (11) as a probability distribution over strings of bits,
with each qubit pair associated with two bits in the string. For this, we define the product
distribution

$$ p(i_1 i_2 \cdots i_{2N}|\vec{w}) = w_{i_1 i_2}\, w_{i_3 i_4} \cdots w_{i_{2N-1} i_{2N}} , \tag{14} $$

where $w_{00} = w_1$, $w_{01} = w_2$, $w_{10} = w_3$, and $w_{11} = w_4$. Using this notation,

$$ \rho^{(N)} = \sum_{i_1, i_2, \ldots, i_{2N}} p(i_1 i_2 \cdots i_{2N})\, \rho_{i_1 i_2} \otimes \cdots \otimes \rho_{i_{2N-1} i_{2N}} , \tag{15} $$

where

$$ p(i_1 i_2 \cdots i_{2N}) = \int d\vec{w}\, p(\vec{w})\, p(i_1 i_2 \cdots i_{2N}|\vec{w}) . \tag{16} $$

We now select a random subset of the bits $i_1 \ldots i_{2N}$ and list all the qubit pairs which
have at least one associated bit in the subset. From this list we choose one qubit pair to
be the target. For each of the other qubit pairs in the list, Alice and Bob both perform one
of a set of three unitary transformations on their half of the pair, followed by a bilateral
controlled-NOT onto the target pair. This sequence of operations is equivalent to replacing
one of the bits of the target pair with the parity of a subset of all the bits. The choice of
unitary transformation corresponds to including the first, second, or both of the bits from
a particular pair in the parity calculation. Then a measurement is performed on the target
pair. (The details of this procedure are given in Ref. [?].) By carrying out such a procedure,
one bit of joint information is acquired about all the pairs, at the expense of sacrificing one
entangled pair (that is, two bits). The unmeasured pairs in general undergo an invertible
transformation among the Bell states, but they do not become entangled with each other,
and this transformation can, if one chooses, be undone, leaving the sequence of bits for
the unmeasured pairs unaltered. Bennett et al. have shown that such a procedure can be
equivalent to finding the parity of any subset of the $2N$ bits. This parity bit then allows
one to update the probability distribution for the remaining $2(N - 1)$-bit string.

Let us examine this in a little more detail. Let $\vec{i} = i_1 i_2 \cdots i_{2N}$ denote a sequence of bits.
We can select a subset of these bits by giving another sequence $\vec{s}$, which includes a 1 for
each bit to be included in the subset and a 0 for the rest. The parity of the subset is then

$$ \pi_{\vec{s}}(\vec{i}) = \vec{s} \cdot \vec{i} = \Bigl(\sum_{m=1}^{2N} s_m i_m\Bigr) \bmod 2 . \tag{17} $$

For a given $\vec{i}$ the probability of getting a value $\pi_{\vec{s}}$ for the parity is either 0 or 1, so the
probability of getting measurement result $\pi_{\vec{s}}$ is

$$ p(\pi_{\vec{s}}) = \sum_{\vec{i}} p(\pi_{\vec{s}}|\vec{i})\, p(\vec{i}) = \sum_{\vec{i}} \delta_{\pi_{\vec{s}},\,\pi_{\vec{s}}(\vec{i})}\, p(\vec{i}) . \tag{18} $$

For simplicity let us assume that the target pair is the last, so the last two bits are sacrificed;
the new state for the $N - 1$ remaining pairs is

$$ \rho^{(N-1)} = \sum_{\vec{i}'} p(\vec{i}'|\pi_{\vec{s}})\, \rho_{i_1 i_2} \otimes \rho_{i_3 i_4} \otimes \cdots \otimes \rho_{i_{2N-3} i_{2N-2}} , \tag{19} $$

where $\vec{i}' = i_1 \cdots i_{2N-2}$ and

$$ p(\vec{i}'|\pi_{\vec{s}}) = \sum_{i_{2N-1},\, i_{2N}} p(\vec{i}|\pi_{\vec{s}}) = \sum_{i_{2N-1},\, i_{2N}} \frac{p(\pi_{\vec{s}}|\vec{i})\, p(\vec{i})}{p(\pi_{\vec{s}})} . \tag{20} $$

Note that while the initial probability distribution $p(\vec{i})$ is symmetric under interchanges of
the pairs, this symmetry is lost after measurement.
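A single round of this update can be traced explicitly for a small number of pairs. The following sketch is a classical bookkeeping illustration only: it assumes that the quantum operations have already delivered the parity of the chosen subset, it replaces the integral of Eq. (16) by a finite mixture of weight assignments, and it takes the target pair to be the last one, as in the text. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def pair_string_prior(weight_mixture, n_pairs):
    """Finite-mixture version of Eq. (16).

    weight_mixture is a list of (p_w, w) with w[(a, b)] the weight of label ab.
    Returns a dict mapping each 2N-bit tuple to its probability p(i).
    """
    prior = {}
    for bits in product((0, 1), repeat=2 * n_pairs):
        total = 0
        for p_w, w in weight_mixture:
            term = p_w
            for k in range(n_pairs):
                term *= w[(bits[2 * k], bits[2 * k + 1])]
            total += term
        prior[bits] = total
    return prior

def parity(s, bits):
    """Eq. (17): parity of the subset of bits selected by the 0/1 string s."""
    return sum(a * b for a, b in zip(s, bits)) % 2

def hash_round(prior, s, observed):
    """Condition on the observed parity, then drop the last two bits (target pair)."""
    p_obs = sum(p for bits, p in prior.items() if parity(s, bits) == observed)
    posterior = {}
    for bits, p in prior.items():
        if parity(s, bits) == observed:
            rest = bits[:-2]
            posterior[rest] = posterior.get(rest, 0) + p / p_obs
    return p_obs, posterior

# Two pairs, one assumed weight assignment: label 00 (Psi-) has weight 7/10.
w = {(0, 0): Fraction(7, 10), (0, 1): Fraction(1, 10),
     (1, 0): Fraction(1, 10), (1, 1): Fraction(1, 10)}
prior = pair_string_prior([(Fraction(1), w)], n_pairs=2)

# Parity of the second bit of each pair, observed to be 0.
p_obs, posterior = hash_round(prior, s=(0, 1, 0, 1), observed=0)
print(p_obs)        # probability of observing parity 0
print(posterior)    # updated distribution for the remaining pair
```

In this example the weight of the label 00 on the surviving pair rises from 7/10 to 14/17, which shows how a single parity bit sharpens the distribution at the cost of one pair.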

The purification scheme follows simply from this. One chooses subsets of the bit string at
random and measures their parity, sacrificing one pair with each measurement, but updating
the probability distribution for the remaining strings. This procedure is repeated until one
is left with only a single string, say $\vec{i}_0$, with probability $1 - \delta$ for some small $\delta$. Written
more formally, the posterior probability $p_{\rm post}$ at the end of the procedure, conditioned on
all measurement results, has the property $p_{\rm post}(\vec{i}_0) = 1 - \delta$ for some sequence $\vec{i}_0$. One then
knows with high probability the maximally entangled state of each remaining pair, which
can then be transformed into a standard state (such as $\Psi_-$) by local operations. The yield
of this procedure is the number of entangled pairs left at the end.

It is clear that there are states for which the yield is zero. The obvious example is a
state $\rho^{\otimes N}$ where $\rho$ is unentangled. For states of the form $\rho_{\vec{w}}^{\otimes N}$, Bennett et al. have shown
that asymptotically the method gives a yield of $N(1 - S_{\vec{w}})$ maximally entangled pairs with
fidelity approaching 1, where

$$ S_{\vec{w}} = -\sum_{j=1}^{4} w_j \log w_j = -\mathrm{Tr}(\rho_{\vec{w}} \log \rho_{\vec{w}}) \tag{21} $$

is the entropy of $\rho_{\vec{w}}$. The argument makes use of the theorem of typical sequences [28]
(which is closely related to Shannon's noiseless coding theorem [29]), according to which,
for any $\epsilon > 0$ and $\delta > 0$ and sufficiently large $N$, there exists a subset TYP($N$) of the set of
all sequences $\vec{i}$ with the following properties:

$$ p(\mathrm{TYP}(N)) = \sum_{\vec{i} \in \mathrm{TYP}(N)} p(\vec{i}|\vec{w}) > 1 - \epsilon , \tag{22} $$

i.e., the total probability of the set TYP($N$) is arbitrarily close to 1; and

$$ |\mathrm{TYP}(N)| < 2^{N(S_{\vec{w}} + \delta)} , \tag{23} $$

i.e., the number of sequences in TYP($N$) is not much larger than $2^{N S_{\vec{w}}}$. The set TYP($N$)
is called the set of typical sequences. Since the parity measurement in each hashing round
rules out half the typical sequences on average and since essentially all the probability is
concentrated on the typical sequences, it can be expected that after sacrificing approximately
$N S_{\vec{w}}$ pairs, essentially all the probability is concentrated on a single typical sequence. Clearly
this leads to a positive yield only if $S_{\vec{w}} < 1$.
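The yield formula can be evaluated directly. The following minimal sketch assumes that the logarithm in Eq. (21) is taken to base 2, so that entropies are measured in bits, and clips the yield at zero when the entropy is 1 bit or more; the function names are hypothetical.

```python
import math

def bell_entropy(w):
    """Eq. (21) with base-2 logarithm: entropy in bits of Bell weights w."""
    return -sum(x * math.log2(x) for x in w if x > 0)

def hashing_yield(w, n_pairs):
    """Asymptotic hashing yield N(1 - S_w), set to zero when S_w >= 1."""
    return max(0.0, n_pairs * (1.0 - bell_entropy(w)))

# A fairly pure state: 90% weight on one Bell state.
w_good = (0.9, 0.1 / 3, 0.1 / 3, 0.1 / 3)
print(bell_entropy(w_good), hashing_yield(w_good, 1000))

# The maximum-entropy state of Sec. II has entropy above 1 bit: no hashing yield.
w_J = (9 / 16, 1 / 16, 3 / 16, 3 / 16)
print(bell_entropy(w_J), hashing_yield(w_J, 1000))
```

The second example shows that a state can be entangled and still be out of reach of hashing alone, which is the situation addressed by the recurrence method.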

The theorem of typical sequences does not hold in general for sequences corresponding
to exchangeable states of the form (11). To apply the hashing method in this case, we rely
on a generalization of the theorem of typical sequences due to Csiszár and Körner [30] (this
theorem has recently been used by Jozsa et al. to derive a universal quantum information
compressing scheme). Applied to our setting, the theorem is that, given a fixed entropy $S_0$,
then for any $\epsilon > 0$ and $\delta > 0$ and sufficiently large $N$, there exists a subset CK($N$) of the
set of all sequences $\vec{i}$ with the following properties:

$$ \sum_{\vec{i} \in \mathrm{CK}(N)} p(\vec{i}|\vec{w}) > 1 - \epsilon \tag{24} $$

for all $\vec{w}$ such that $S_{\vec{w}} < S_0$, which means that the set CK($N$) is typical for all probability
distributions with entropy less than $S_0$; and

$$ |\mathrm{CK}(N)| < 2^{N(S_0 + \delta)} , \tag{25} $$

i.e., the number of sequences in CK($N$) is not much larger than $2^{N S_0}$. In the following,
when we write "typical sequences," we mean sequences in CK($N$), whereas by "atypical
sequences" we mean sequences in $\overline{\mathrm{CK}(N)}$, the complement of CK($N$).
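To get a feeling for the size of such a universal set, one can count sequences directly. The sketch below assumes one common construction, which is not spelled out in the text: the set of all length-$N$ strings of Bell labels whose empirical distribution has entropy at most $S_0$. For this set, the standard counting bound is $(N+1)^4\, 2^{N S_0}$, i.e., the exponential growth rate is $S_0$ up to a polynomial factor. The function names are hypothetical.

```python
import math

def empirical_entropy(counts):
    """Entropy in bits of the empirical distribution given by label counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c > 0)

def low_entropy_set_size(n_pairs, s0):
    """Number of length-N strings over the 4 Bell labels with empirical entropy <= s0."""
    total = 0
    for a in range(n_pairs + 1):
        for b in range(n_pairs + 1 - a):
            for c in range(n_pairs + 1 - a - b):
                d = n_pairs - a - b - c
                if empirical_entropy((a, b, c, d)) <= s0:
                    # Multinomial coefficient: strings with these label counts.
                    total += math.factorial(n_pairs) // (
                        math.factorial(a) * math.factorial(b)
                        * math.factorial(c) * math.factorial(d))
    return total

n, s0 = 20, 1.0
size = low_entropy_set_size(n, s0)
print(size, 4 ** n)                       # far fewer than all 4**n strings
print(math.log2(size) / n)                # growth rate per pair, in bits
print(size <= (n + 1) ** 4 * 2 ** (n * s0))
```

For small $N$ the polynomial prefactor is not negligible, so the rate $\log_2|\mathrm{set}|/N$ exceeds $S_0$; it is only in the limit of large $N$ that the rate approaches $S_0$, in line with Eq. (25).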

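Now assume that we want to perform the hashing protocol on a state of $N$ pairs of the
form (11) with the property

$$ \int_{S_{\vec{w}} \ge S_0} d\vec{w}\, p(\vec{w}) = \eta \ll 1 \tag{26} $$

for some entropy $S_0 < 1$; i.e., there is only a small a priori probability that the entropy of
the unknown state exceeds the given value $S_0$. (The case of states that do not have this
property will be discussed at the end of this section.) Furthermore, assume that $N$ is large
enough that there exists a Csiszár-Körner set CK($N$) with constants $\epsilon, \delta \ll 1$ in Eqs. (24)
and (25). It then follows that

$$ \begin{aligned}
p(\mathrm{CK}(N)) &= \sum_{\vec{i} \in \mathrm{CK}(N)} p(\vec{i}) \\
&= \sum_{\vec{i} \in \mathrm{CK}(N)} \int d\vec{w}\, p(\vec{w})\, p(\vec{i}|\vec{w}) \\
&\ge \sum_{\vec{i} \in \mathrm{CK}(N)} \int_{S_{\vec{w}} < S_0} d\vec{w}\, p(\vec{w})\, p(\vec{i}|\vec{w}) \\
&= \int_{S_{\vec{w}} < S_0} d\vec{w}\, p(\vec{w}) \sum_{\vec{i} \in \mathrm{CK}(N)} p(\vec{i}|\vec{w}) \\
&> \int_{S_{\vec{w}} < S_0} d\vec{w}\, p(\vec{w})\, (1 - \epsilon) \\
&= (1 - \eta)(1 - \epsilon) \\
&> 1 - \eta - \epsilon ,
\end{aligned} \tag{27} $$

where Eqs. (12), (24), and (26) have been used.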

We use this inequality, in combination with Eq. (25), to derive the asymptotic yield of the
hashing algorithm applied to exchangeable states. We restrict our analysis to a simplified
protocol, in which we choose a number $r$, somewhat larger than $N(S_0 + \delta)$, such that

$$ 2^{-r}\, 2^{N(S_0 + \delta)} \equiv \zeta \ll 1 . \tag{28} $$



We begin with input strings $\vec{i}$ that have probability $p(\vec{i})$. Let $h$ denote a sequence of $r$
parity checks on random subsets, and let $\vec{o} = o_1, \ldots, o_r$ denote the $r$-bit string of parity
checks, or outcomes. Note that we denote all strings of bits as vectors, even though they
are not all of the same length. The probability distribution $p(h)$ on parity-check sequences
is weighted uniformly on all sequences. For a given input string $\vec{i}$ and a given parity-check
sequence $h$, the outcome $\vec{o}$ is determined; we denote this deterministic outcome by $\vec{o}_{h,\vec{i}}$.
We can express this deterministic outcome in terms of a probability for outcome string $\vec{o}$,
given parity-check sequence $h$ and input string $\vec{i}$:

$$ p(\vec{o}|h, \vec{i}) = \delta_{\vec{o},\,\vec{o}_{h,\vec{i}}} . \tag{29} $$

Since for each parity-check bit obtained, two bits of the input string are discarded, two
strings with the same parity check, which differ only on those two bits, become the same
after that step. After $r$ steps of a parity-check sequence $h$, there will be only $N - r$ entangled
pairs, corresponding to a string of $2(N - r)$ bits. If one starts with a string $\vec{i}$, one will be left
with a shorter substring $\vec{i}_h(\vec{i})$. Different initial strings $\vec{i}$ that generate the same outcome
$\vec{o}$ and lead to the same final substring $\vec{i}_h(\vec{i})$ are equivalent for practical purposes. Let us
denote the set of all input strings $\vec{i}$ which lead to outcome $\vec{o}$ and to output substring $\vec{i}_h$
by $I_h(\vec{o}, \vec{i}_h) = \{\vec{i} \mid \vec{o}_{h,\vec{i}} = \vec{o},\ \vec{i}_h(\vec{i}) = \vec{i}_h\}$.

For parity-check sequence $h$, we are interested in outcomes $\vec{o}$ such that all typical input
strings $\vec{i}$ that lead to $\vec{o}$ produce the same output string $\vec{i}_h(\vec{i})$. For outcomes where this is
the case, the procedure picks out a unique output string from among all those that could be
produced by a typical input string. In this case, we say that we accept the outcome $\vec{o}$ and
the corresponding unique output string, which we denote by $\vec{i}_{h,\vec{o}}$. In this way we divide
the outcomes for a parity-check sequence $h$ into two sets, the set of accepted outcomes, $A_h$,
and its complement. For an outcome that we accept and for a typical input string, we can
write the conditional probability (29) as

$$ p(\vec{o}|h, \vec{i}, \vec{i} \in \mathrm{CK}(N)) =
\begin{cases} 1 , & \text{if } \vec{i} \in I_h(\vec{o}, \vec{i}_{h,\vec{o}}) \\ 0 , & \text{if } \vec{i} \notin I_h(\vec{o}, \vec{i}_{h,\vec{o}}) \end{cases}
= \delta_{\vec{o},\,\vec{o}_{h,\vec{i}}}\, \delta_{\vec{i}_h(\vec{i}),\,\vec{i}_{h,\vec{o}}}
\quad \text{for } \vec{o} \in A_h . \tag{30} $$

Though the additional Kronecker delta in this expression is redundant, it reminds one that
any typical input string $\vec{i}$ that leads to an accepted outcome $\vec{o}$ produces output string $\vec{i}_{h,\vec{o}}$.
Notice that this is not true for atypical input strings: an atypical input string can have
outcome $\vec{o}$ and produce output string $\vec{i}_{h,\vec{o}}$ or a different output string.

The probability that the outcome is accepted, given input string $\vec{i}$ and parity-check
sequence $h$, is

$$ p(\mathrm{accept}|h, \vec{i}) = \sum_{\vec{o} \in A_h} p(\vec{o}|h, \vec{i})
= \sum_{\vec{o} \in A_h} \delta_{\vec{o},\,\vec{o}_{h,\vec{i}}}
= \begin{cases} 1 , & \text{if } \vec{i} \text{ leads to an accepted outcome,} \\
0 , & \text{if } \vec{i} \text{ does not lead to an accepted outcome.} \end{cases} \tag{31} $$
Notice that this conditional acceptance probability can be nonzero for atypical input strings.
The complementary probability, that the outcome is not accepted, given $\vec{i}$ and $h$, is given
by

$$ p(\overline{\mathrm{accept}}|h, \vec{i}) = \sum_{\vec{o} \notin A_h} p(\vec{o}|h, \vec{i})
= \sum_{\vec{o} \notin A_h} \delta_{\vec{o},\,\vec{o}_{h,\vec{i}}}
= \begin{cases} 0 , & \text{if } \vec{i} \text{ leads to an accepted outcome,} \\
1 , & \text{if } \vec{i} \text{ does not lead to an accepted outcome.} \end{cases} \tag{32} $$

If the input string is a typical string, the conditional acceptance probability can also be
written as

$$ p(\mathrm{accept}|h, \vec{i}, \vec{i} \in \mathrm{CK}(N)) = \sum_{\vec{o} \in A_h} \delta_{\vec{o},\,\vec{o}_{h,\vec{i}}}\, \delta_{\vec{i}_h(\vec{i}),\,\vec{i}_{h,\vec{o}}} \tag{33} $$

[see Eq. (30)].

What we are interested in for the present is the probability to have an outcome that is
accepted, given a typical input string, but averaged over all parity-check sequences:

$$ \begin{aligned}
p(\mathrm{accept}|\vec{i}, \vec{i} \in \mathrm{CK}(N)) &= \sum_h p(\mathrm{accept}|h, \vec{i}, \vec{i} \in \mathrm{CK}(N))\, p(h) \\
&= \sum_h p(h) \sum_{\vec{o} \in A_h} \delta_{\vec{o},\,\vec{o}_{h,\vec{i}}}\, \delta_{\vec{i}_h(\vec{i}),\,\vec{i}_{h,\vec{o}}} .
\end{aligned} \tag{34} $$

The complementary probability,

$$ \begin{aligned}
p(\overline{\mathrm{accept}}|\vec{i}, \vec{i} \in \mathrm{CK}(N)) &= \sum_h p(\overline{\mathrm{accept}}|h, \vec{i}, \vec{i} \in \mathrm{CK}(N))\, p(h) \\
&= \sum_h p(h) \sum_{\vec{o} \notin A_h} \delta_{\vec{o},\,\vec{o}_{h,\vec{i}}} ,
\end{aligned} \tag{35} $$

is the average probability not to have an outcome that is accepted, given the typical input
string $\vec{i}$. This probability is the probability that for a random parity-check sequence, the
typical input string $\vec{i}$ leads to an outcome that does not pick out a unique output string
$\vec{i}_{h,\vec{o}}$, i.e., does not lead to the only possible output string that could have been produced
by a typical input string. We can bound this probability in the following way. The number
of typical sequences satisfies $|\mathrm{CK}(N)| < 2^{N(S_0 + \delta)}$. For parity subsets chosen randomly, the
probability that two typical input strings, $\vec{i}$ and $\vec{j}$, agree on all $r$ parity checks, i.e., have
the same outcome, is $\le 2^{-r}$; thus the probability that $\vec{i}$ and $\vec{j}$ agree on all $r$ parity checks
and produce different output strings, $\vec{i}_h(\vec{i})$ and $\vec{i}_h(\vec{j})$, is $\le 2^{-r}$. Hence the probability of
not producing a unique output, given a typical input $\vec{i}$, is bounded by

$$ p(\overline{\mathrm{accept}}|\vec{i}, \vec{i} \in \mathrm{CK}(N)) < 2^{-r} \times 2^{N(S_0 + \delta)} = \zeta . \tag{36} $$

This implies that the conditional acceptance probability (34) satisfies

$$ p(\mathrm{accept}|\vec{i}, \vec{i} \in \mathrm{CK}(N)) > 1 - \zeta . \tag{37} $$
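Bayes's rule tells us that the posterior probability for output string $\vec{i}_h$, given $h$ and $\vec{o}$, is

$$ \begin{aligned}
p(\vec{i}_h|h, \vec{o}) &= \sum_{\vec{i} \in I_h(\vec{o}, \vec{i}_h)} p(\vec{i}|h, \vec{o}) \\
&= \sum_{\vec{i} \in I_h(\vec{o}, \vec{i}_h)} \frac{p(\vec{o}|h, \vec{i})\, p(h)\, p(\vec{i})}{p(\vec{o}|h)\, p(h)} \\
&= \sum_{\vec{i} \in I_h(\vec{o}, \vec{i}_h)} \frac{p(\vec{o}|h, \vec{i})\, p(\vec{i})}{p(\vec{o}|h)} ,
\end{aligned} \tag{38} $$

where

$$ p(\vec{o}|h) = \sum_{\vec{i}} p(\vec{o}|h, \vec{i})\, p(\vec{i}) \tag{39} $$

is the probability for outcome string $\vec{o}$, given parity-check sequence $h$.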

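Given a parity-check sequence $h$ and an accepted outcome $\vec{o} \in A_h$ for that sequence, we
judge the "success" of the accepted output string by the posterior probability, i.e.,

$$ p(\mathrm{success}|h, \vec{o}) = p(\vec{i}_{h,\vec{o}}|h, \vec{o}) = \sum_{\vec{i} \in I_h(\vec{o}, \vec{i}_{h,\vec{o}})} p(\vec{i}|h, \vec{o}) \quad \text{for } \vec{o} \in A_h . \tag{40} $$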

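The total probability of success, $p(\mathrm{success})$, is obtained by averaging over all parity-check
sequences $h$ and over all accepted outcomes $\vec{o} \in A_h$. This probability can be manipulated
in the following ways:

$$ \begin{aligned}
p(\mathrm{success}) &= \sum_h \sum_{\vec{o} \in A_h} p(\mathrm{success}|h, \vec{o})\, p(\vec{o}|h)\, p(h) \\
&= \sum_h \sum_{\vec{o} \in A_h} \sum_{\vec{i} \in I_h(\vec{o}, \vec{i}_{h,\vec{o}})} p(\vec{i}|h, \vec{o})\, p(\vec{o}|h)\, p(h) \\
&= \sum_h \sum_{\vec{o} \in A_h} \sum_{\vec{i} \in I_h(\vec{o}, \vec{i}_{h,\vec{o}})} p(\vec{o}|h, \vec{i})\, p(h)\, p(\vec{i}) \\
&\ge \sum_h \sum_{\vec{o} \in A_h} \sum_{\substack{\vec{i} \in I_h(\vec{o}, \vec{i}_{h,\vec{o}}) \\ \vec{i} \in \mathrm{CK}(N)}} p(\vec{o}|h, \vec{i})\, p(h)\, p(\vec{i}) \\
&= \sum_h \sum_{\vec{o} \in A_h} \sum_{\vec{i} \in \mathrm{CK}(N)} \delta_{\vec{o},\,\vec{o}_{h,\vec{i}}}\, \delta_{\vec{i}_h(\vec{i}),\,\vec{i}_{h,\vec{o}}}\, p(h)\, p(\vec{i}) .
\end{aligned} \tag{41} $$

The inequality here follows from restricting the sum over input strings to typical strings
and reflects the fact that an atypical string might lead to an accepted outcome and to
the accepted output string $\vec{i}_{h,\vec{o}}$, thereby contributing to the success probability. The final
equality comes from using Eq. (30) for $p(\vec{o}|h, \vec{i})$. Using Eqs. (27), (34), and (37), we can
now bound the probability of success:

$$ \begin{aligned}
p(\mathrm{success}) &\ge \sum_{\vec{i} \in \mathrm{CK}(N)} p(\vec{i}) \sum_h p(h) \sum_{\vec{o} \in A_h} \delta_{\vec{o},\,\vec{o}_{h,\vec{i}}}\, \delta_{\vec{i}_h(\vec{i}),\,\vec{i}_{h,\vec{o}}} \\
&= \sum_{\vec{i} \in \mathrm{CK}(N)} p(\vec{i})\, p(\mathrm{accept}|\vec{i}, \vec{i} \in \mathrm{CK}(N)) \\
&> (1 - \zeta) \sum_{\vec{i} \in \mathrm{CK}(N)} p(\vec{i}) \\
&= (1 - \zeta)\, p(\mathrm{CK}(N)) \\
&> (1 - \zeta)(1 - \eta - \epsilon) \\
&> 1 - \zeta - \eta - \epsilon .
\end{aligned} \tag{42} $$

This is the desired result. Assuming we can choose arbitrary positive constants $\epsilon$ and $\eta$ and
have sufficiently large $N$, the probability (42) can be made arbitrarily close to 1.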
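The bound of Eq. (42) translates into a simple recipe for the number of hashing rounds. The sketch below is illustrative and uses only the relation $2^{-r}\, 2^{N(S_0+\delta)} = \zeta$ from Eq. (36): for given $N$, $S_0$, $\delta$ and a target $\zeta$, it returns the smallest integer $r$ that achieves the target, together with the resulting number of surviving pairs and the lower bound on the success probability. The numerical values and function names are assumptions made for the example.

```python
import math

def hashing_rounds(n_pairs, s0, delta, zeta):
    """Smallest integer r with 2**(n_pairs * (s0 + delta) - r) <= zeta."""
    return math.ceil(n_pairs * (s0 + delta) + math.log2(1.0 / zeta))

def success_lower_bound(zeta, eta, eps):
    """Right-hand side of Eq. (42)."""
    return 1.0 - zeta - eta - eps

n_pairs, s0, delta = 10000, 0.5, 0.01
zeta, eta, eps = 1e-3, 1e-2, 1e-2

r = hashing_rounds(n_pairs, s0, delta, zeta)
print(r)                                   # parity checks, one pair sacrificed each
print(n_pairs - r)                         # pairs left at the end
print(success_lower_bound(zeta, eta, eps))
```

The overhead beyond $N(S_0 + \delta)$ is only $\log_2(1/\zeta)$ rounds, so it does not affect the asymptotic yield $N(1 - S_0)$.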

Except for certain singular distributions $p(\vec{w})$, given an exchangeable state of the
form (11), it is always possible to make $\eta$ in Eq. (26) arbitrarily small by choosing the
entropy $S_0$ sufficiently large ($0 < S_0 < 2$); if $S_0 > 1$, however, then the number of hashing
rounds $r > N$, which means there is no yield since $N - r < 0$. To decrease the value
of $S_0$ and thereby make the yield positive or increase an already positive yield, one can
perform quantum state tomography on some of the pairs to obtain more data about the
state, generally producing a narrower posterior distribution $p'(\vec{w})$ (see Sec. II). The width
of the posterior distribution depends on the number of pairs sacrificed for the tomographic
measurements, but not on the total number of pairs $N$. The number of pairs needed for
tomography can therefore be neglected in the asymptotic limit of large $N$.

Asymptotically, the prior probability of obtaining a posterior $p'(w)$ concentrated at
$w = w_0$ with an entropy $S_{w_0} < S_0$ is given by the expression

$$ p(S < S_0) = \int_{S_w < S_0} dw\, p(w) , \tag{43} $$

where p(w) is the prior distribution (11) defining the initial state. Putting everything
together, we see that, for $S_0 < 1$, $p(S < S_0)$ is the probability of obtaining an asymptotic yield
of $N(1 - S_0)$ using a combination of quantum state tomography and one-way hashing.
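As a rough numerical illustration of the yield statement above, the following sketch evaluates the entropy of a Bell-diagonal state and estimates $p(S < S_0)$. It assumes that the entropy $S_w$ is the Shannon entropy (in bits) of the four weights, and it borrows, purely as an example prior, a distribution that is uniform in the fidelity F of a Werner-type state on 1/4 < F < 1. The function names are ours and hypothetical.

```python
import numpy as np

def bell_entropy(w):
    """Shannon entropy (bits) of Bell-diagonal weights w = (w1, w2, w3, w4)."""
    w = np.asarray(w, dtype=float)
    nz = w[w > 0]                      # 0 * log 0 is taken as 0
    return float(-(nz * np.log2(nz)).sum())

def hashing_yield(n_pairs, s0):
    """Asymptotic number of distilled pairs N(1 - S0); zero if S0 >= 1."""
    return max(0.0, n_pairs * (1.0 - s0))

def prob_entropy_below(s0, n_grid=20001):
    """p(S < S0) for an assumed prior uniform in Werner fidelity F on [1/4, 1]."""
    F = np.linspace(0.25, 1.0, n_grid)
    S = np.array([bell_entropy([f, (1 - f) / 3, (1 - f) / 3, (1 - f) / 3]) for f in F])
    return float(np.mean(S < s0))      # uniform prior: fraction of the grid

print(bell_entropy([0.25] * 4))        # completely mixed pair: 2 bits
print(hashing_yield(1000, 0.4))        # pairs distilled from N = 1000 at S0 = 0.4
print(prob_entropy_below(1.0))         # prior probability that hashing alone has a yield
```

For this example prior, only the part of the fidelity range with entropy below 1 bit contributes to $p(S < 1)$, which is why a narrow posterior obtained by tomography matters for the yield.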

If most of the prior distribution p(w) is concentrated on states with an entropy exceeding 
1 bit, i.e., if p(S < 1) is small, then it will normally be a better strategy to precede the 
hashing procedure by a few iterations of the recurrence method. This is the content of the 
next section. 



IV. ENTANGLEMENT PURIFICATION BY RECURRENCE 

If the generating function p(w) has no significant support on weights w with $S_w < 1$,
then hashing cannot be used for entanglement purification, at least initially. It might still 
be possible, however, to distill some entanglement by using the more robust (but far more 
wasteful) technique of recurrence ||,[|. 

In the recurrence algorithm, an initial set of 2N entangled qubit pairs is grouped into
N sets of 2 pairs each. In each set, one pair is designated the target pair, and the other the
control pair. Alice and Bob thus have N target qubits and N control qubits each. Alice
now rotates all her qubits by $\pi/2$ about the x axis, while Bob rotates all his qubits by
$-\pi/2$ about the x axis. Each of them then performs a controlled-NOT operation from each
control qubit onto the corresponding target qubit and measures his or her target qubit in
the z basis ($|0\rangle$ and $|1\rangle$). The target qubits are then discarded. If Alice and Bob both get
the same result for a given target pair (i.e., both 0 or both 1), the procedure has succeeded,
and the control pair can be shown to have increased entanglement. If their results differ,
the procedure has failed, and the control qubits must also be discarded.

If the state of both target and control pairs is of form (10), the probability of success is

$$ p_s = p_s(w) = (w_1 + w_4)^2 + (w_2 + w_3)^2 , \tag{44} $$

and the new state of the control pair after the measurement has weights

$$
\begin{aligned}
w'_1 &= 2 w_2 w_3 / p_s , \\
w'_2 &= (w_2^2 + w_3^2)/p_s , \\
w'_3 &= 2 w_1 w_4 / p_s , \\
w'_4 &= (w_1^2 + w_4^2)/p_s .
\end{aligned} \tag{45}
$$

If the weight of the desired Bell state is initially greater than 1/2, then this procedure
converges towards that weight being equal to 1. The convergence is
slow, however, and since more than half of all the pairs are discarded each time, the yield is
generally low.
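The slow convergence and low yield can be seen in a minimal sketch that iterates the update. It assumes the map reads $w'_1 = 2w_2w_3/p_s$, $w'_2 = (w_2^2 + w_3^2)/p_s$, $w'_3 = 2w_1w_4/p_s$, $w'_4 = (w_1^2 + w_4^2)/p_s$ with $p_s = (w_1 + w_4)^2 + (w_2 + w_3)^2$. The starting weights, with 0.7 placed on the fourth Bell state, are a hypothetical choice, and the function name is ours.

```python
import numpy as np

def recurrence_step(w):
    """One recurrence round on Bell-diagonal weights (w1, w2, w3, w4).

    Returns (p_s, w_new) following Eqs. (44) and (45) as read above.
    """
    w1, w2, w3, w4 = w
    p_s = (w1 + w4) ** 2 + (w2 + w3) ** 2
    w_new = np.array([2 * w2 * w3,
                      w2 ** 2 + w3 ** 2,
                      2 * w1 * w4,
                      w1 ** 2 + w4 ** 2]) / p_s
    return p_s, w_new

# Hypothetical starting point: weight 0.7 on the fourth Bell state.
w = np.array([0.1, 0.1, 0.1, 0.7])
fraction_kept = 1.0
for round_index in range(1, 9):
    p_s, w = recurrence_step(w)
    fraction_kept *= p_s / 2           # half the pairs are measured, and only successes survive
    print(round_index, round(p_s, 4), np.round(w, 4), f"{fraction_kept:.2e}")
```

The last printed column is the expected fraction of the original pairs that survives; it shrinks by more than a factor of two per round, which is the sense in which the method is wasteful.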

Suppose that instead of a product state we have an exchangeable state of the form (11).
We can carry out the procedure exactly as before, grouping the pairs into sets of two, with
a target and a control pair in each set. If there are initially 2N pairs in the state

$$ \rho^{(2N)} = \int dw\, p(w)\, \rho_w^{\otimes 2N} , \tag{46} $$

then after performing the measurements, Alice and Bob will get the same result $N_s$ times
and different results $N - N_s$ times, leaving them with a new state of the form (46) for $N_s$
pairs. For large N, the posterior distribution $p(w|N_s)$ will generally be sharply peaked about
those w which give a value of $p_s$ close to $N_s/N$. Unlike hashing, the recurrence algorithm
produces a posterior state $\rho^{(N_s)}$ which is exchangeable. We now turn to how we find this
state in light of the measurement results.

Compared with the hashing algorithm, where precisely one bit of information is obtained
in each round of the procedure, in the recurrence method much more information is obtained,
namely the value of $N_s$. We can therefore deduce the posterior distribution

$$ p(w|N_s) = \frac{p(N_s|w)\,p(w)}{p(N_s)} , \tag{47} $$

where

$$ p(N_s|w) = \binom{N}{N_s} [p_s(w)]^{N_s} [1 - p_s(w)]^{N - N_s} , \tag{48} $$

and

$$ p(N_s) = \int dw\, p(N_s|w)\,p(w) . \tag{49} $$

Because the remaining states have been transformed according to (45), we must also change
to the new variables w'. So the new state is

$$ \rho^{(N_s)} = \int dw'\, p'(w')\, \rho_{w'}^{\otimes N_s} , \tag{50} $$

where

$$ p'(w')\, dw' = p(w|N_s)\, dw . \tag{51} $$
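One way to carry out this update numerically, without evaluating the integrals in closed form, is to represent the prior by samples and reweight them. The sketch below is an illustration under stated assumptions: the prior is taken to be a flat Dirichlet distribution over the four weights (a hypothetical choice), the likelihood is the binomial form of Eq. (48), and the change of variables is done by pushing each sample through the map (45), read as $w'_1 = 2w_2w_3/p_s$, $w'_2 = (w_2^2 + w_3^2)/p_s$, $w'_3 = 2w_1w_4/p_s$, $w'_4 = (w_1^2 + w_4^2)/p_s$. All function names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def success_probability(w):
    """p_s(w) of Eq. (44); w has shape (..., 4)."""
    return (w[..., 0] + w[..., 3]) ** 2 + (w[..., 1] + w[..., 2]) ** 2

def recurrence_map(w):
    """Weights after a successful round, Eq. (45); w has shape (..., 4)."""
    w1, w2, w3, w4 = np.moveaxis(w, -1, 0)
    p_s = success_probability(w)
    new = np.stack([2 * w2 * w3, w2**2 + w3**2, 2 * w1 * w4, w1**2 + w4**2], axis=-1)
    return new / p_s[..., None]

def posterior_samples(prior_samples, n_sets, n_success):
    """Return (w_prime, weights): a weighted-sample stand-in for p'(w')."""
    p_s = success_probability(prior_samples)
    # Log-likelihood of Eq. (48); the binomial coefficient cancels in Eq. (47).
    log_like = n_success * np.log(p_s) + (n_sets - n_success) * np.log1p(-p_s)
    weights = np.exp(log_like - log_like.max())
    weights /= weights.sum()                       # normalization, the role of p(N_s)
    return recurrence_map(prior_samples), weights  # change of variables w -> w'

# Hypothetical prior: flat Dirichlet over the four weights.
prior = rng.dirichlet(np.ones(4), size=50_000)
w_prime, weights = posterior_samples(prior, n_sets=200, n_success=150)
print("posterior mean of p_s:", np.sum(weights * success_probability(prior)))
print("posterior mean of w' :", weights @ w_prime)
```

Because each sample keeps its posterior weight when it is mapped to w', the weighted samples represent $p'(w')$ directly and no explicit Jacobian is needed.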

While this Bayesian procedure is very simple compared to the hashing method, it is still
a bit too complicated for simple illustration. There is, however, an even simpler variant of
this technique that is easy to analyze. Suppose that, instead of the general Bell-diagonal
state (10), we have an initial Werner state

$$ \rho(F) = F\,\Phi_+ + \frac{1-F}{3}\,(\Phi_- + \Psi_+ + \Psi_-) . \tag{52} $$

We can carry out the recurrence procedure exactly as above, with the probability of success

$$ p_s(F) = (8F^2 - 4F + 5)/9 ; \tag{53} $$

here F denotes the fidelity of the state with $\Phi_+$, with $F > 1/2$ necessary for distillability.
The recurrence procedure does not in general lead to a new state of form (52), but by twirling
the state can be put in this form, at the cost of some increase in entropy. The new state
has a fidelity

$$ F' = \frac{10F^2 - 2F + 1}{8F^2 - 4F + 5} . \tag{54} $$
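The behavior of this one-parameter map is easy to explore numerically. The following minimal sketch encodes the success probability $(8F^2 - 4F + 5)/9$ and the fidelity update $(10F^2 - 2F + 1)/(8F^2 - 4F + 5)$, checks three candidate fixed points, and counts rounds from a hypothetical starting fidelity of 0.6; the function names are ours.

```python
def success_probability(F):
    """Eq. (53): probability that a recurrence round succeeds."""
    return (8 * F**2 - 4 * F + 5) / 9

def next_fidelity(F):
    """Eq. (54): fidelity after a successful round followed by twirling."""
    return (10 * F**2 - 2 * F + 1) / (8 * F**2 - 4 * F + 5)

for F0 in (0.25, 0.5, 1.0):            # each of these is mapped to itself
    print(F0, next_fidelity(F0))

F, rounds = 0.6, 0                     # hypothetical starting fidelity
while F < 0.99:
    F = next_fidelity(F)
    rounds += 1
print("rounds needed from F = 0.6:", rounds)
```

Starting just above 1/2 the fidelity grows slowly at first, which illustrates why many rounds, each discarding more than half the pairs, may be needed.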

Suppose that we have 2N entangled pairs, with partial information sufficient to determine
that they are all in a state of the form (52), but not to determine the exact fidelity F. The
joint state of the pairs is then

$$ \rho^{(2N)} = \int dF\, p(F)\, \rho(F)^{\otimes 2N} . \tag{55} $$

We then group the pairs into sets of two and carry out the recurrence procedure on each
set, with $N_s$ successful results. We can then deduce a revised generating function

$$ p(F|N_s) = \frac{p(N_s|F)\,p(F)}{p(N_s)} , \tag{56} $$

where

$$ p(N_s|F) = \binom{N}{N_s} [p_s(F)]^{N_s} [1 - p_s(F)]^{N - N_s} , \tag{57} $$

and

$$ p(N_s) = \int dF\, p(N_s|F)\,p(F) . \tag{58} $$

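The new density operator for the $N_s$ remaining pairs is

$$ \rho^{(N_s)} = \int dF'\, p'(F')\, \rho(F')^{\otimes N_s} , \tag{59} $$

where the posterior distribution is expressed in terms of the new variable F' given by
(54). Working this out explicitly, we get

$$ p'(F') = \frac{6}{(10 - 8F')^2} \left( -2 + \frac{11 - 8F'}{\sqrt{6F' - 4F'^2 - 1}} \right) p\bigl(F(F')\,|\,N_s\bigr) , \tag{60} $$

where F(F') is the inverse of (54):

$$ F(F') = \frac{(1 - 2F') + 3\sqrt{6F' - 4F'^2 - 1}}{10 - 8F'} . \tag{61} $$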
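The change of variables can be checked numerically. The sketch below assumes the inverse map has the form $F(F') = [(1 - 2F') + 3\sqrt{6F' - 4F'^2 - 1}]/(10 - 8F')$ and uses a closed-form Jacobian $dF/dF'$ obtained by differentiating that expression; this closed form is our own derivation, offered as an illustration, and the function names are hypothetical. The loop compares the closed form with a finite-difference slope.

```python
import math

def next_fidelity(F):
    """Forward map F -> F' of the simplified recurrence method."""
    return (10 * F**2 - 2 * F + 1) / (8 * F**2 - 4 * F + 5)

def previous_fidelity(Fp):
    """Inverse map F(F') on 1/4 <= F' <= 1 (the branch with the plus sign)."""
    root = math.sqrt(max(0.0, 6 * Fp - 4 * Fp**2 - 1))
    return ((1 - 2 * Fp) + 3 * root) / (10 - 8 * Fp)

def jacobian(Fp):
    """dF/dF' from differentiating the inverse map (our derivation)."""
    root = math.sqrt(6 * Fp - 4 * Fp**2 - 1)
    return 6 * (-2 + (11 - 8 * Fp) / root) / (10 - 8 * Fp) ** 2

for F in (0.3, 0.5, 0.7, 0.9):
    Fp = next_fidelity(F)
    h = 1e-6
    slope = (previous_fidelity(Fp + h) - previous_fidelity(Fp - h)) / (2 * h)
    print(F, previous_fidelity(Fp), jacobian(Fp), slope)
```

The posterior density in the new variable is then the density in F, evaluated at F(F'), multiplied by this Jacobian.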

We can see how much information is gained by a single round of the recurrence method 
using this simplified version as an example. If the initial generating function is a uniform 
distribution, p(F) = 4/3 for 1/4 < F < 1, then for large N the posterior distribution 
is highly peaked after one round. We see this in Figure 1, where the prior and posterior 
distributions are shown for different values of N and a typical choice of $N_s$. Note that states
with 1/4 < F < 1/2 move towards F = 1/4 under the procedure, producing a peak about
the completely mixed state; for high N and the value of $N_s$ used in our example, this peak
is suppressed by the Bayesian updating. States with F > 1/2 move towards F = 1. The
procedure has fixed points at F = 1/4, F = 1/2, and F = 1.
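The sharpening of the posterior with N can be reproduced with a short grid calculation. This is a minimal sketch, assuming a uniform prior on 1/4 < F < 1, a binomial likelihood in the success probability $(8F^2 - 4F + 5)/9$, and $N_s = 2N/3$ successes out of N sets. It works in the original variable F before the change to F', and the function names are ours.

```python
import numpy as np

def success_probability(F):
    """Success probability of one round for a Werner state of fidelity F."""
    return (8 * F**2 - 4 * F + 5) / 9

def posterior_on_grid(n_sets, n_success, n_grid=3001):
    """Posterior density p(F | N_s) on a uniform grid over [1/4, 1]."""
    F = np.linspace(0.25, 1.0, n_grid)
    dF = F[1] - F[0]
    p_s = success_probability(F)
    # Binomial likelihood up to a constant factor; the flat prior 4/3 also cancels.
    unnormalized = p_s**n_success * (1 - p_s) ** (n_sets - n_success)
    return F, unnormalized / (unnormalized.sum() * dF)

for N in (9, 18, 48, 99):
    F, density = posterior_on_grid(N, 2 * N // 3)
    dF = F[1] - F[0]
    mass_below_half = density[F < 0.5].sum() * dF   # weight on undistillable states
    print(N, round(F[np.argmax(density)], 3), round(mass_below_half, 3))
```

The second printed column is the location of the posterior peak and the third is the posterior probability of F < 1/2, which is the quantity suppressed by the Bayesian updating as N grows.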

It should be noted that because of its extremely small yield, the recurrence method 
should never be used if hashing is possible. An initial state that cannot be distilled by the 
hashing method, however, might, after one or more rounds of the recurrence method, satisfy 
the criterion (26) for some value of $S_0 < 1$. If that is so, then a combination of tomography
and hashing should be used thereafter, as described in the last section. 

Similarly, if $p(\rho)$ has some support on distillable and some on undistillable states, a
few rounds of the recurrence method generally produce convergence on either a distillable
or undistillable state, without ambiguity. Under certain circumstances, however, it might
be beneficial to supplement this with tomographic measurements on a number of pairs as
well. For example, the updating procedure (45) treats the coefficients $w_1, w_4$ and $w_2, w_3$
symmetrically. An initially symmetric state thus has this symmetry preserved, and the
distribution p(w) might become double-peaked. In this case, measuring a small number of
pairs would suffice to eliminate one of the two peaks.

V. CONCLUSION 

We have given a Bayesian account of the entanglement purification procedures of one-way 
hashing and recurrence. The Bayesian formulation allows us to provide a straightforward 
discussion of the conditions under which maximally entangled states can be distilled from
unknown or partially known quantum states. For one-way hashing, we have given the a 
priori probabilities for the possible asymptotic yields of maximally entangled pairs. Our 
results can be used to decide which combination of quantum state tomography, recurrence, 
and hashing to use to obtain the highest expected yield, both asymptotically and in the case 
of a fixed number of initially given pairs. Although our discussion is entirely in terms of 
pairs of qubits, the method is general and can be applied to any generalization of hashing 
or recurrence in Hilbert spaces of higher dimension. 

FIG. 1. Plots of an initially uniform distribution for the generalized Werner state for fidelities
between F = 1/4 (maximally mixed) and F = 1 (maximally entangled) and updated distributions 
after one round of the simplified recurrence method. Before the round there are 2N pairs; we 
assume the procedure succeeds in 2N/3 cases ($p_s = 2/3$). The new distribution is plotted for
N = 9, 18, 48, 99. The new distribution is more and more highly peaked for bigger N, and the 
probability of unentangled states is more and more strongly suppressed. 